Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

kumarPatra_07
by New Contributor
  • 4406 Views
  • 1 reply
  • 0 kudos

Resolved! Getting an error while mounting to a storage account

While mounting to the storage account using the code below: dbutils.fs.mount(  source=f"wasbs://{cointainer_name}@{storage_name}.blob.core.windows.net",  mount_point=f"/mnt/{cointainer_name}",  extra_configs={f"fs.azure.account.key.{storage_name}.blob.c...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @kumarPatra_07, greetings! From the code you have shared, I can see that you are using the WASBS driver to mount the storage, and WASB is now deprecated. Reference document: https://learn.microsoft.com/en-us/azure/dat...

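Following the reply above, a minimal sketch of reading the container directly over abfss (ADLS Gen2) instead of using a deprecated WASBS mount; the storage account, container, secret scope/key, and path below are placeholders, not values from the original post:

storage_name = "mystorageaccount"    # placeholder storage account name
container_name = "mycontainer"       # placeholder container name
account_key = dbutils.secrets.get(scope="my-scope", key="storage-account-key")  # hypothetical secret

# Configure the account key for the ABFS driver, then read directly without a mount.
spark.conf.set(f"fs.azure.account.key.{storage_name}.dfs.core.windows.net", account_key)

df = (spark.read.format("csv")
      .option("header", "true")
      .load(f"abfss://{container_name}@{storage_name}.dfs.core.windows.net/path/to/data"))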
ksenija
by Contributor
  • 3804 Views
  • 1 reply
  • 0 kudos

Log data from reports in Power BI

Where can I find log data from Power BI? I need to find out which tables are being used in my Power BI reports that point to Databricks. I tried system.access.audit, but I'm not finding new data when I refresh my report.

Latest Reply
Allia
Databricks Employee
  • 0 kudos

@ksenija To enable ODBC logging in Power BI, go to the C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Simba Spark ODBC Driver folder, create or edit the file microsoft.sparkodbc.ini, and update it as below: [Driver] LogLevel=6 LogPath=<...

shahabm
by New Contributor III
  • 15567 Views
  • 5 replies
  • 2 kudos

Resolved! Databricks job keeps failing due to GC issue

There is a job that used to run successfully, but for more than a month we have been experiencing long runs that end up failing. In the stdout log file (attached), there are numerous messages like the following: [GC (Allocation Failure) [PSYoungGen:...] and [Full GC ...

Latest Reply
siddhu30
New Contributor II
  • 2 kudos

Thanks a lot @shahabm for your prompt response, appreciate it. I'll try to debug in this direction. Thanks again!

4 More Replies
Freshman
by New Contributor III
  • 4280 Views
  • 4 replies
  • 2 kudos

Resolved! Timezone in silver tables

Hello, What is the best practice in Databricks for storing DateTime data in silver layer tables, considering the source data is in AEST and we store it in UTC by default? Thanks

Latest Reply
robert154
New Contributor III
  • 2 kudos

@Freshman wrote: "Hello, What is the best practice in Databricks for storing DateTime data in silver layer tables, considering the source data is in AEST and we store it in UTC by default? Thanks" The best practice for storing DateTime data in the Silver l...

3 More Replies
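A minimal sketch of the UTC conversion discussed in this thread, converting an AEST-local timestamp column before it lands in the silver table; the DataFrame and column names are placeholders, and "Australia/Sydney" is used as the zone id (it includes AEDT daylight saving, so adjust if a fixed +10:00 offset is intended):

from pyspark.sql import functions as F

# bronze_df is assumed to be the upstream DataFrame with an AEST-local timestamp column.
silver_df = bronze_df.withColumn(
    "event_ts_utc",
    F.to_utc_timestamp(F.col("event_ts_local"), "Australia/Sydney"),
)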
dc-rnc
by Contributor
  • 1492 Views
  • 1 reply
  • 1 kudos

Resolved! if-else statement in DAB YAML file

Hi. Is it possible to use a "better" way to override the "git_branch" key's value in the right file (which is the resource YAML file)? Or a different way, like an "if-else" statement? I'd like to have it all in the resource YAML file instead of overrid...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

I think your approach of using targets and ${var.git_branch} variables is correct. At the moment there is no if-else statement, but I will investigate internally whether there is a better way.

DineshOjha
by New Contributor II
  • 700 Views
  • 2 replies
  • 0 kudos

Read task level parameters in python

I am creating Databricks jobs and tasks using a Python package. I have defined a task-level parameter and would like to reference it in my script using sys.argv. How can I do that?

Latest Reply
DineshOjha
New Contributor II
  • 0 kudos

Thanks, but the link works for notebooks. I have a Python package run as a Python wheel and am wondering how to access the parameters. When I run the job, it's not able to pick up the task-level parameters in sys.argv.

1 More Replies
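For reference, a minimal sketch of a Python wheel entry point that reads task parameters; the parameter name (--env) is hypothetical, and it assumes the wheel task's parameters surface as command-line arguments to the entry point:

import argparse
import sys

def main():
    # Everything after the program name is what the wheel task passed in.
    print("Raw arguments:", sys.argv[1:])

    parser = argparse.ArgumentParser()
    parser.add_argument("--env", default="dev")  # hypothetical task parameter
    args, _ = parser.parse_known_args()
    print("env =", args.env)

if __name__ == "__main__":
    main()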
garciargs
by New Contributor III
  • 1463 Views
  • 2 replies
  • 4 kudos

DLT: multiple source tables to a single silver table generating unexpected results

Hi, I've been trying this all day long. I'm building a POC of a pipeline that would be used in my everyday ETL. I have two initial tables, vendas and produtos, as follows: vendas_raw: venda_id, produto_id, data_venda, quantidade, valor_total, dth_in...

Latest Reply
NandiniN
Databricks Employee
  • 4 kudos

When dealing with Change Data Capture (CDC) in Delta Live Tables, it's crucial to handle out-of-order data correctly. You can use the APPLY CHANGES API to manage this. The APPLY CHANGES API ensures that the most recent data is used by specifying a co...

1 More Replies
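As an illustration of the APPLY CHANGES API mentioned in the reply, a minimal Python sketch for a DLT pipeline; the target name, source table, key, and the sequencing column (updated_at) are assumptions based loosely on the post's vendas example, not its actual schema:

import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table("vendas_silver")

dlt.apply_changes(
    target="vendas_silver",
    source="vendas_raw",            # assumed name of the raw/bronze streaming table
    keys=["venda_id"],
    sequence_by=col("updated_at"),  # hypothetical column used to order out-of-order events
    stored_as_scd_type=1,
)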
Anmol_Chauhan
by New Contributor II
  • 1474 Views
  • 4 replies
  • 1 kudos

How to use Widgets with SQL Endpoint in Databricks?

I'm trying to use widgets with SQL endpoints but I'm encountering an error, whereas they work seamlessly with a Databricks interactive cluster. While query parameters can substitute for widgets in SQL endpoints, I specifically require dropdown and multi...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Got it, let me test. I think there is no specific way to do it, but if you add the option "All" as the first item in the list, it should select it.

3 More Replies
lukamu
by New Contributor II
  • 1268 Views
  • 3 replies
  • 1 kudos

Resolved! Issue with filter_by in Databricks SQL Query History API (/api/2.0/sql/history/queries)

Hi everyone, I'm trying to use the filter_by parameter in a GET request to /api/2.0/sql/history/queries, but I keep getting a 400 Bad Request error. When I use max_results, it works fine, but adding filter_by causes the request to fail. Example value f...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @lukamu, glad that it worked for you!

2 More Replies
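One way to sidestep hand-building the nested filter_by query string is the Databricks Python SDK, which serializes the filter for the REST call. A sketch, assuming the databricks-sdk package is installed and default authentication is configured; the FINISHED status filter and max_results value are arbitrary examples:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import QueryFilter, QueryStatus

w = WorkspaceClient()

# Ask for recent finished queries from the SQL warehouse query history.
resp = w.query_history.list(
    filter_by=QueryFilter(statuses=[QueryStatus.FINISHED]),
    max_results=25,
)

# In recent SDK versions the page of results is returned in .res.
for q in resp.res or []:
    print(q.query_id, q.status, (q.query_text or "")[:80])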
GinoBarkley
by New Contributor II
  • 1151 Views
  • 1 reply
  • 1 kudos

Resolved! Extracting data from GCP BigQuery using Foreign Catalog

On Databricks, I have created a connection of type Google BigQuery and tested the connection successfully. I have then created a foreign catalog from the connection to a Google BigQuery project. I can see all the datasets and tables in the Foreign Catalog...

Data Engineering
bigquery
Databricks
location US
Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @GinoBarkley, could you please advise which DBR version you are using with your personal access mode cluster? I see these requirements: Databricks clusters must use Databricks Runtime 16.1 or above and shared or single user access mode. SQL warehouses must ...

hari-prasad
by Valued Contributor II
  • 2870 Views
  • 6 replies
  • 2 kudos

'from_json' spark function not parsing value column from Confluent Kafka topic

For one of the badge completions, it was mandatory to complete a Spark Streaming demo practice. Due to the absence of a Kafka broker setup required for the demo practice, I configured a Confluent Kafka cluster and made several modifications to the Spark sc...

Latest Reply
saurabh18cs
Honored Contributor II
  • 2 kudos

I am not sure if I read the full explanation, but how about this: df.withColumn('value_str', F.decode(F.col('value'), 'utf-8')).withColumn('value_json', F.explode(F.from_json(F.col('value_str'), json_schema))).select('value_json.*'...

5 More Replies
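For completeness, a minimal sketch of parsing a Confluent Kafka value column with from_json in a Databricks notebook; the bootstrap servers, topic, credentials, and JSON schema below are placeholders, and it assumes the payload is plain JSON rather than Schema Registry Avro:

from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Hypothetical schema for the JSON messages on the topic.
json_schema = StructType([
    StructField("order_id", StringType()),
    StructField("quantity", IntegerType()),
])

raw_df = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "<broker>:9092")   # placeholder
          .option("subscribe", "<topic>")                        # placeholder
          .option("kafka.security.protocol", "SASL_SSL")
          .option("kafka.sasl.mechanism", "PLAIN")
          .option("kafka.sasl.jaas.config",
                  "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
                  "required username='<api-key>' password='<api-secret>';")
          .load())

parsed_df = (raw_df
             .withColumn("value_str", F.col("value").cast("string"))
             .withColumn("value_json", F.from_json("value_str", json_schema))
             .select("value_json.*"))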
Balram-snaplogi
by New Contributor II
  • 1858 Views
  • 2 replies
  • 0 kudos

Not able to run jobs using M2M authentication from our code

Hi, I am using OAuth machine-to-machine (M2M) authentication with the JDBC approach. String url = "jdbc:databricks://<server-hostname>:443"; Properties p = new java.util.Properties(); p.put("httpPath", "<http-path>"); p.put("AuthMech", "11"); p.put("Auth_F...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Balram-snaplogi, it looks like a permission problem. Could you check if the service principal has the necessary permissions to execute jobs? In Databricks, permissions for jobs can be managed to control access. The following permissions are availabl...

1 More Replies
hedbergjacob
by New Contributor II
  • 2309 Views
  • 2 replies
  • 0 kudos

Resolved! Delta Live Table "Default Schema" mandatory but not editable

Hi, we have an issue with a DLT pipeline. We want to add some source code to an existing pipeline. However, when we save, an error message shows that "Default schema" is a mandatory field, yet we are not able to edit the field. The DLT pipeline does...

Data Engineering
deltalivetables
Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Does the pipeline settings JSON include the "schema" field? If you have full admin rights, you can update the existing pipeline settings to include the "schema" field, for example: curl -X PATCH https://<databricks-instance>/api/2.0/pipelines/<pipeline-id>...

1 More Replies
simha08
by New Contributor II
  • 915 Views
  • 2 replies
  • 0 kudos

Unable to Read Collection/Files from MongoDB using Azure Databricks

Hi there, can someone help me read data from MongoDB using Azure Databricks? Surprisingly, I am able to connect from a Jupyter notebook and read data, but not from Azure Databricks. 1) I have installed the required Spark connector packages in the clust...

Latest Reply
simha08
New Contributor II
  • 0 kudos

I am using the following code to read data from MongoDB using Databricks: from pyspark.sql import SparkSession; spark = SparkSession.builder.appName("myApp").config("spark.mongodb.connection.uri", "mongodb+srv://username:password@cluster.xxxx.mongo...

1 More Replies
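A minimal sketch of a read that typically works with the MongoDB Spark connector v10.x on Databricks (the connector library, e.g. the org.mongodb.spark:mongo-spark-connector Maven package, must be installed on the cluster); the connection URI, database, and collection names are placeholders:

df = (spark.read.format("mongodb")
      .option("connection.uri", "mongodb+srv://<user>:<password>@<cluster-host>")  # placeholder URI
      .option("database", "sample_db")       # hypothetical database name
      .option("collection", "sample_coll")   # hypothetical collection name
      .load())

df.printSchema()
display(df.limit(5))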
RamanBajetha
by New Contributor II
  • 890 Views
  • 2 replies
  • 1 kudos

Issue with Generic DLT Pipeline Handling Multiple BUs

We are implementing a data ingestion framework where data flows from a foreign catalog (source) to a raw layer (Delta tables) and then to a bronze layer (DLT streaming tables). Currently, each Business Unit (BU) has a separate workflow and DLT pipeli...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

You can create separate schemas within the same catalog for each BU. For example, you can have schemas like BU1_schema, BU2_schema, etc., within the same catalog. By using Unity Catalog, you can segregate BU-specific tables within the same DLT pipeli...

1 More Replies
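To make the idea above concrete, a sketch of one generic DLT source file parameterized per BU through a pipeline configuration value; the configuration key (bu), the catalog and schema naming, and the table names are assumptions for illustration only:

import dlt

# Read the BU identifier from the pipeline's configuration, e.g. {"bu": "bu1"} in pipeline settings.
bu = spark.conf.get("bu", "bu1")

@dlt.table(
    name=f"{bu}_orders_bronze",
    comment=f"Bronze orders for business unit {bu}",
)
def orders_bronze():
    # The raw-layer table name per BU is a placeholder for this sketch.
    return spark.readStream.table(f"raw_catalog.{bu}_raw.orders")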
