Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

CBL
by New Contributor
  • 1961 Views
  • 1 replies
  • 0 kudos

Schema Evolution in Azure Databricks

Hi all - In my scenario, I'm loading data from hundreds of JSON files. The problem is that fields/columns are missing when a JSON file contains new fields. Full load: while writing JSON to Delta, use the option ("mergeschema", "true") so that we do not miss new columns. Inc...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

For these scenarios, you can use schema evolution capabilities like mergeSchema, or opt to use the new VariantType to avoid requiring a schema at the time of ingest.
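To make the mergeSchema suggestion concrete: in PySpark the write typically looks like df.write.format("delta").option("mergeSchema", "true").mode("append").save(path). The sketch below does not use Spark; it is a plain-Python illustration, assuming each JSON file holds flat records, of what schema merging does conceptually: the target schema becomes the union of all fields seen, and rows lacking a field get null (None) for it. The function names merged_schema and align_to_schema and the sample records are hypothetical, and this is not how Delta implements it internally.

```python
import json


def merged_schema(records):
    """Return the ordered union of top-level field names across all records."""
    fields = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)
    return fields


def align_to_schema(records, fields):
    """Project each record onto the merged schema; absent fields become None (null)."""
    return [{field: record.get(field) for field in fields} for record in records]


# Two hypothetical JSON files: the second introduces a new field, "region".
file_1 = [json.loads('{"id": 1, "amount": 10.5}')]
file_2 = [json.loads('{"id": 2, "amount": 7.0, "region": "EU"}')]

all_records = file_1 + file_2
fields = merged_schema(all_records)
rows = align_to_schema(all_records, fields)
print(fields)  # ['id', 'amount', 'region']
print(rows)
```

Without the merge step, a schema fixed from the first file would silently drop "region", which is the symptom described in the question.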

TheDataEngineer
by New Contributor
  • 5956 Views
  • 1 replies
  • 0 kudos

'replaceWhere' clause in spark.write for a partitioned table

Hi, I want to be clear about the 'replaceWhere' clause in spark.write. Here is the scenario: I would like to add a column to a few existing records. The table is already partitioned on the "PickupMonth" column. Here is an example without 'replaceWhere': spark.read \.f...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

For this style of ETL, there are 2 methods. The first method, strictly for partitioned tables, is Dynamic Partition Overwrites, which requires a Spark configuration to be set and detects which partitions are to be overwritten by scanning the input...
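In Spark, the two approaches are usually set up roughly like this: dynamic partition overwrite via spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic") followed by a mode("overwrite") write, and replaceWhere via .option("replaceWhere", ...) with mode("overwrite") and a predicate such as PickupMonth = '2024-02' (an example value). As a hedged illustration of the semantics only, the plain-Python sketch models a table as a list of row dicts. The helper names and sample rows are hypothetical, and the predicate check in replace_where is assumed to mirror Delta's default behavior of rejecting written rows that fall outside the replaceWhere condition.

```python
def dynamic_partition_overwrite(table, new_rows, partition_col):
    """Replace only the partitions present in new_rows; other partitions are kept."""
    touched = {row[partition_col] for row in new_rows}
    kept = [row for row in table if row[partition_col] not in touched]
    return kept + list(new_rows)


def replace_where(table, new_rows, predicate):
    """Delete rows matching predicate, then append new_rows.

    Rejects new rows outside the predicate (assumed to mirror Delta's default check).
    """
    if not all(predicate(row) for row in new_rows):
        raise ValueError("all new rows must satisfy the replaceWhere predicate")
    kept = [row for row in table if not predicate(row)]
    return kept + list(new_rows)


# Hypothetical table partitioned on PickupMonth; the update adds a "tip" column to February.
table = [
    {"PickupMonth": "2024-01", "fare": 10},
    {"PickupMonth": "2024-02", "fare": 12},
]
update = [{"PickupMonth": "2024-02", "fare": 12, "tip": 2}]

print(dynamic_partition_overwrite(table, update, "PickupMonth"))
print(replace_where(table, update, lambda row: row["PickupMonth"] == "2024-02"))
```

Both calls leave January untouched and rewrite only February. The difference is that replaceWhere targets an explicit predicate, while dynamic overwrite infers the affected partitions from the incoming data.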

jabori
by New Contributor
  • 3853 Views
  • 2 replies
  • 0 kudos

How can I pass job parameters to a dbt task?

I have a dbt task that will use dynamic parameters from the job: {"start_time": "{{job.start_time.[timestamp_ms]}}"}. My SQL is edited like this: select 1 as id union all select null as id union all select {start_time} as id. This causes the task to fail. How...

Latest Reply
MathieuDB
Databricks Employee
  • 0 kudos

Also, you need to pass the parameters using the --vars flag, like this:
dbt run --vars '{"start_time": "{{job.start_time.[timestamp_ms]}}"}'
You will need to modify the 3rd dbt command in your job.
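Two pieces fit together here: the job passes values with --vars, and the model reads them with dbt's var() Jinja function, e.g. select {{ var('start_time') }} as id, instead of the plain {start_time} placeholder from the question, which dbt does not substitute. The Python sketch only shows how the --vars argument is assembled as a JSON string and shell-quoted. build_dbt_run is a hypothetical helper, and the literal timestamp is a made-up stand-in for the value Databricks substitutes for {{job.start_time.[timestamp_ms]}}.

```python
import json
import shlex


def build_dbt_run(vars_dict):
    """Build the argv list for `dbt run`, passing job parameters as a JSON --vars string."""
    return ["dbt", "run", "--vars", json.dumps(vars_dict)]


# Hypothetical value standing in for what the job substitutes for
# {{job.start_time.[timestamp_ms]}} before dbt is invoked.
argv = build_dbt_run({"start_time": "1700000000000"})
print(shlex.join(argv))  # dbt run --vars '{"start_time": "1700000000000"}'
```

Building the JSON with json.dumps avoids hand-written quoting mistakes; shlex.join is used only to display the equivalent shell command.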

colospring
by New Contributor
  • 1564 Views
  • 2 replies
  • 0 kudos

create_feature_table returns an error saying the database does not exist when it does

Hi, I am new to Databricks and I am taking the training course on Databricks machine learning: https://www.databricks.com/resources/webinar/azure-databricks-free-training-series-asset4-track/thank-you. When executing the code to create a feature tabl...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

What would the result be if you used ` ` instead of ' '?
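For context on the backtick suggestion: in Spark SQL, single quotes delimit string literals while backticks delimit identifiers, so a database or table name containing characters such as hyphens generally needs backticks. A minimal sketch of building such a name follows. The helper names and the example my-db / features are hypothetical, and whether create_feature_table needs the quoted form in this particular case is an assumption to verify.

```python
def quote_identifier(part):
    """Backtick-quote one Spark SQL identifier, doubling any embedded backtick."""
    return "`" + part.replace("`", "``") + "`"


def qualified_name(*parts):
    """Join database/table parts into a dot-separated name with each part quoted."""
    return ".".join(quote_identifier(part) for part in parts)


# Hypothetical names; a hyphen makes the unquoted name invalid as an identifier.
# Written as 'my-db' in single quotes, Spark SQL would read it as a string literal instead.
print(qualified_name("my-db", "features"))  # `my-db`.`features`
```

Each part is quoted separately so the dot stays a separator rather than becoming part of one identifier.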

ls
by New Contributor III
  • 1193 Views
  • 2 replies
  • 1 kudos

Resolved! Are lambda functions considered bad practice?

As the title suggests, I have a bunch of lambda functions within my notebooks and I wanted to know if it is considered to be "bad" to have them in there. output_list = json_files.mapPartitions(lambda partition: iter([process_partition(partition)])) \.f...

Latest Reply
Satyadeepak
Databricks Employee
  • 1 kudos

Using lambda functions within notebooks is not inherently "bad," but there are some considerations to keep in mind. While this code is functional, chaining multiple lambda functions can reduce readability and debugging capabilities in Databricks note...
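One way to act on the readability point is to replace the lambda from the question with a small named function, which shows up by name in stack traces and can be tested on its own; in Spark it would be used as json_files.mapPartitions(summarize_partition). The sketch runs without Spark: map_partitions is a hypothetical local imitation of RDD.mapPartitions, and the body of process_partition is a made-up stand-in because the original is not shown.

```python
from itertools import chain


def process_partition(records):
    """Hypothetical stand-in for the question's process_partition: counts the records."""
    return {"record_count": sum(1 for _ in records)}


def summarize_partition(partition):
    """Named replacement for `lambda partition: iter([process_partition(partition)])`.

    mapPartitions passes an iterator per partition and expects an iterable back,
    so the single summary dict is wrapped in a one-element iterator.
    """
    return iter([process_partition(partition)])


def map_partitions(partitions, func):
    """Local imitation of RDD.mapPartitions: apply func to each partition, then flatten."""
    return list(chain.from_iterable(func(iter(part)) for part in partitions))


partitions = [["a.json", "b.json"], ["c.json"]]
print(map_partitions(partitions, summarize_partition))
# [{'record_count': 2}, {'record_count': 1}]
```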

lauraxyz
by Contributor
  • 651 Views
  • 1 replies
  • 0 kudos

Is there a way to analyze/monitor WRITE operations in a Notebook?

I have user input as a notebook, which processes data and saves it to a global temp view. My caller notebook then executes the input notebook with the dbutils.notebook API. Since the user can do anything in their notebook, I would like to analyze...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @lauraxyz, I think you can use system tables and audit logs to achieve that monitoring: https://docs.databricks.com/en/admin/account-settings/audit-logs.html

greenned
by New Contributor
  • 4459 Views
  • 1 replies
  • 0 kudos

Resolved! Not using defined clusters when deploying workflows in development mode with an asset bundle

Hi, I'm using a Databricks asset bundle to deploy workflows, but when I deploy in development mode, the workflows do not use the new clusters; they just use existing clusters. Can I deploy with the defined new clusters in development mode?

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

You could use mode: development and then deploy with --compute-id, specifying the ID of your personal compute cluster to replace the existing clusters. Only with mode: development will the compute ID replace existing or per-task cluster specs.

manuel-barreiro
by New Contributor II
  • 1715 Views
  • 5 replies
  • 0 kudos

Unable to view hive_metastore schemas although I have the same permissions as co-workers who can

Hello! I'm having trouble accessing the schemas of the hive_metastore. I have the same level of permissions as my coworkers, who don't have any trouble viewing the schemas. I would really appreciate it if you could help me with this beca...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Were you able to get this issue resolved after looking at the permission levels on your schema and tables?

yevsh
by New Contributor II
  • 2303 Views
  • 4 replies
  • 0 kudos

Java UDF can't access files in Unity Catalog - Operation not permitted

I am using Databricks on Azure. In PySpark I register a Java UDF: spark.udf.registerJavaFunction("foo", "com.foo.Foo", T.StringType()). Foo tries to load a file located in the Databricks Unity Catalog using Files.readAllLines(). stderr log: Tue J...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

To address the issue of needing to run initialization code that reads file content during the load of a UDF (User Defined Function) in Databricks, you should avoid performing file operations in the constructor due to security restrictions. Instead, y...

jeremy98
by Honored Contributor
  • 5248 Views
  • 7 replies
  • 6 kudos

Migrating logic from Airflow DAGs to Databricks Workflow

Hello community, I'm planning to migrate some logic from Airflow DAGs to Databricks Workflows, but I have some doubts about how to find the equivalent of my current DAG logic in Workflows. There are two ...

Latest Reply
Walter_C
Databricks Employee
  • 6 kudos

You can use Asset Bundles https://docs.databricks.com/en/dev-tools/bundles/index.html 

Paul92S
by New Contributor III
  • 3543 Views
  • 12 replies
  • 5 kudos

Delta Sharing service issue making requests to Unity Catalog system access tables

Hi all, we have been having an issue since yesterday which I believe is related to queries against system.access.table_lineage in Unity Catalog. This issue still persists today. We get the following error: AnalysisException: [RequestId= ErrorClass=B...

Latest Reply
Alberto_Umana
Databricks Employee
  • 5 kudos

Thanks team, please let me know if you need any other help!

jar
by Contributor
  • 1747 Views
  • 8 replies
  • 1 kudos

Databricks single user compute cannot write to storage

I've deployed unrestricted single user compute for each developer in our dev workspace, and everything works fine except for writing to storage, where the cell runs continuously but seemingly does not execute anything. If I switch to an unrestricted sha...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Adding to @saurabh18cs's comments, also check whether an instance profile is attached to the cluster. What is the difference between the clusters: only the access mode?

Anirudh077
by New Contributor III
  • 1420 Views
  • 1 replies
  • 0 kudos

Resolved! Cannot create a serverless SQL warehouse, only classic and pro options available

Hey team, I am using Databricks on Azure (East US region) and I have enabled serverless compute in Settings -> Feature Enablement. When I click on create SQL warehouse, I do not see the serverless option. Is there any setting I am missing?

Latest Reply
Anirudh077
New Contributor III
  • 0 kudos

I found the root cause of this issue: in Security and Compliance we had PCI-DSS selected, and according to this doc we cannot have that; instead, we can select HIPAA.

eballinger
by Contributor
  • 2790 Views
  • 4 replies
  • 2 kudos

Resolved! DLT notebook dynamic declaration

Hi guys, we have a DLT pipeline that reads data from landing to raw (CSV files into tables) for approximately 80 tables. In our first attempt, we declared each table separately in a Python notebook, one @Dlt table declared per cell. Then w...

Latest Reply
VZLA
Databricks Employee
  • 2 kudos

Good catch and glad to hear you've identified the source of delay!
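For declaring many similar tables (about 80 in the question) without one cell per table, a commonly used pattern is a loop that calls a factory function, so that each table definition captures its own source name rather than the loop variable's final value. The plain-Python sketch assumes that pattern: table is a hypothetical stand-in for @dlt.table, TABLES is a hypothetical registry, and the source names are made up. In a real pipeline the decorator would be the dlt one and the body would read the landing-zone CSV files.

```python
TABLES = {}  # hypothetical registry standing in for the pipeline's table catalog


def table(name):
    """Hypothetical stand-in for @dlt.table(name=...): registers the function under name."""
    def decorator(func):
        TABLES[name] = func
        return func
    return decorator


def declare_raw_table(source_name):
    """Factory: each call gets its own source_name, so the body never sees a later loop value."""
    @table(name=f"raw_{source_name}")
    def load():
        # A real pipeline would read the landing-zone CSV files for source_name here.
        return f"read landing/{source_name}"
    return load


for source in ["customers", "orders", "payments"]:
    declare_raw_table(source)

print(sorted(TABLES))  # ['raw_customers', 'raw_orders', 'raw_payments']
print(TABLES["raw_orders"]())  # read landing/orders
```

Defining load directly inside the for loop would make every body read the final value of source when it eventually runs, which is the main reason for routing each declaration through a function.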
