SELECT '(CC) ABC'     REGEXP '\\b\\(CC\\)\\b' AS TEST1,
       'A(CC) ABC'    REGEXP '\\b\\(CC\\)\\b' AS TEST2,
       'A (CC)A ABC'  REGEXP '\\b\\(CC\\)\\b' AS TEST3,
       'A (CC) A ABC' REGEXP '\\b\\(CC\\)\\b' AS TEST4,
       'A ABC (CC)'   REGEXP '\\b\\(CC\\)\\b' AS TES...
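Spark's REGEXP uses Java regex, whose word-boundary semantics match Python's here, so the surprising results above can be sketched with Python's re module. The key point: `\b` only asserts a boundary between a word character and a non-word character, and `(` and `)` are themselves non-word characters.

```python
import re

# \b is a zero-width assertion between a word character ([A-Za-z0-9_]) and a
# non-word character. Since '(' and ')' are themselves non-word characters,
# \b\(CC\)\b only matches when a word character sits IMMEDIATELY outside each
# parenthesis -- not when the parentheses are surrounded by spaces.
pattern = re.compile(r'\b\(CC\)\b')

print(bool(pattern.search('(CC) ABC')))    # False: no word char adjacent to '(' or ')'
print(bool(pattern.search('A(CC)A ABC')))  # True: word chars hug both parentheses
```

If the intent is "(CC) delimited by whitespace or string edges", one common alternative in both Python and Java regex is lookarounds, e.g. `(?<!\S)\(CC\)(?!\S)`, which matches `'(CC) ABC'` but not `'A(CC)A ABC'`.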
I'm able to get to the Permissions page of the schema and table I'm trying to do access control on within the Data Explorer page. At first you can only grant permissions but not revoke anything. Only after you have made new grants can you revoke w...
If I manually delete some Parquet files from the location where the real data is stored, the Spark catalog still has the old version. How can I sync them? Thanks!
You just need to create a new table and specify the location of the data; in your case that's going to be ADLS, S3, etc. Example:
CREATE TABLE customer USING DELTA LOCATION '/mnt/data/'
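For a Delta table whose underlying files were deleted by hand, Databricks also provides FSCK REPAIR TABLE, which removes the dangling file references from the transaction log. A minimal sketch (the table name and location are hypothetical; on a cluster you would pass each string to spark.sql(...)):

```python
# Hypothetical table name and location; on Databricks, run each statement
# via spark.sql(...).
table = "customer"
location = "/mnt/data/customer"

# Re-register the existing Delta files as a table...
create_stmt = f"CREATE TABLE IF NOT EXISTS {table} USING DELTA LOCATION '{location}'"

# ...and drop transaction-log entries for files that no longer exist on disk.
repair_stmt = f"FSCK REPAIR TABLE {table}"

print(create_stmt)
print(repair_stmt)
```

FSCK REPAIR TABLE also supports a DRY RUN variant that only lists the missing files, which is a safer first step.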
I have a custom application/executable that I upload to DBFS and transfer to my cluster's local storage for execution. I want to call multiple instances of this application in parallel, which I've only been able to successfully do with Python's subpr...
Autoscaling works for Spark jobs only. It works by monitoring the Spark task queue, which plain Python code never enters. If it's just Python code, try a single-node cluster. https://docs.databricks.com/clusters/configure.html#cluster-size-and-autoscaling
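Running multiple instances of a local executable in parallel from the driver can be sketched with the standard library alone. This is a hedged illustration: the executable here is a stand-in (it just invokes the Python interpreter), where on the cluster you would point at the binary copied from DBFS to local storage.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the uploaded executable: we invoke the Python
# interpreter; on a cluster this would be the binary copied from DBFS to
# local storage (e.g. a path under /local_disk0).
def run_instance(arg: str) -> str:
    result = subprocess.run(
        [sys.executable, "-c", f"print('processed {arg}')"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Launch several instances in parallel. Note: this runs on the driver node
# only, so Spark autoscaling will not react to this workload.
with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = list(pool.map(run_instance, ["a", "b", "c", "d"]))

print(outputs)  # ['processed a', 'processed b', 'processed c', 'processed d']
```

Threads are sufficient here because each worker blocks on a child process rather than holding the GIL.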
You can find a rich ecosystem of tools that allow you to work with all your data in place and deliver real-time business insights faster. This post will help you connect your existing tools like dbt, Fivetran, Power BI, Tableau or SAP to ingest, transf...
Hello Taha, here is a fairly recent video provided by Databricks on connecting Power BI: Demo Video: Connect to Power BI Desktop from Databricks - YouTube
Hi All, hope everyone is doing well. We are currently validating Databricks on GCP and Azure. We have a Python notebook that does some ETL (copy, extract zip files, and process files within the zip files). Our cluster config on Azure: DBX Runtime - 10.4 - Dr...
KB Feedback Discussion: In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting. Thes...
I have been trying to upsert rows into a table in Azure Blob Storage (ADLS Gen 2) based on two partitions (sample code below).

insert overwrite table new_clicks_table partition(client_id, mm_date)
select
    click_id
    ,user_id
    ,click_timestamp_gmt
    ,ca...
How can I complete the Databricks Lakehouse Platform administration accreditation for free, just like Lakehouse Fundamentals? How do I get the platform administrator accreditation, like the Lakehouse Fundamentals one?
Hi @Christy Seto​, I cleared the Lakehouse exam before 30 November 2022 and was eligible for 100 community points. I cleared it with the email id manpreet.kaur@celebaltech.com, but I still haven't received the 100 points. I have edited my e...
We are reading from an S3 bucket which contains several million JSON files. The schema from the read is stored in a JSON file in the DBFS FileStore. This file is then utilized by Auto Loader to write new files nightly to a Delta table. The schema is...
If anyone is curious, I ended up just passing the schema as a string to .schema(eval(the_schema)) in StructType format, rather than using the file-based approach.
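eval works, but it executes arbitrary text from the file. A hedged alternative (column names here are made up for illustration) is to persist the schema as a DDL string, which PySpark's DataFrameReader.schema() accepts directly, so no eval is needed:

```python
import pathlib
import tempfile

# Hypothetical schema; the column names are placeholders.
ddl_schema = "click_id STRING, user_id STRING, click_timestamp_gmt TIMESTAMP"

# Persist it as plain text (on Databricks this could live under /dbfs/FileStore).
schema_file = pathlib.Path(tempfile.mkdtemp()) / "schema.ddl"
schema_file.write_text(ddl_schema)

# Read it back -- no eval needed. On a cluster you would then call something
# like: spark.readStream.format("cloudFiles").schema(loaded)...
loaded = schema_file.read_text()
print(loaded == ddl_schema)  # True
```

The DDL form is also easier to review and diff than a StructType repr.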
I'm trying to port a Python SQL Thrift client to .NET and I receive a 500 error when trying to open a session. Is there a way to have a SQL warehouse server mock in order to investigate the error?
There is no single answer to this. If you look at Parquet, which is a very common format on data lakes: https://parquet.apache.org/docs/file-format/nulls/ and on SO
Hello, I have taken the Azure datasets that are available for practice. I got the 10 days of data from that dataset and now I want to save this data into DBFS in CSV format. I am facing an error: "No such file or directory: '/dbfs/tmp/myfolder/mytest.c...
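That error usually means the parent folder does not exist yet; plain open() (and pandas' to_csv) will not create missing directories. A minimal sketch, with a temporary directory standing in for /dbfs/tmp (the folder and file names are placeholders):

```python
import csv
import os
import tempfile

# Stand-in for '/dbfs/tmp' -- on Databricks you would use base = '/dbfs/tmp'.
base = tempfile.mkdtemp()
target = os.path.join(base, "myfolder", "mytest.csv")

# Create the parent directory first; open() does not create missing folders.
os.makedirs(os.path.dirname(target), exist_ok=True)

with open(target, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "value"])
    writer.writerow([1, "hello"])

print(os.path.exists(target))  # True
```

On Databricks, dbutils.fs.mkdirs("dbfs:/tmp/myfolder") achieves the same thing from the DBFS side before writing through the /dbfs fuse mount.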