Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

lauraxyz
by Contributor
  • 6903 Views
  • 5 replies
  • 1 kudos

Put file into volume within Databricks

Hi! From a Databricks job, I want to copy a workspace file into a volume. How can I do that? I tried `dbutils.fs.cp("/Workspace/path/to/the/file", "/Volumes/path/to/destination")` but got: Public DBFS root is disabled. Access is denied on path: /Workspac...

Latest Reply
fjrodriguez
New Contributor III
  • 1 kudos

I have one question, and I think this post is the best fit for it. I want to overwrite wheel files in a Volume I have already created in my CI/CD process. I have something like this:          - ${{if parameters.filesPackages}}:            - $...

4 More Replies
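The error in the question arises because `dbutils.fs.cp` interprets a bare `/Workspace` path as a DBFS path. One common workaround is to use plain local-filesystem I/O instead, since both paths are mounted on the driver. A minimal sketch, with hypothetical paths:

```python
import shutil

def copy_workspace_file_to_volume(src: str, dst: str) -> str:
    """Copy a file using plain local-filesystem I/O.

    On a Databricks cluster, /Workspace and /Volumes are both mounted
    on the driver's local filesystem, so standard Python file I/O
    sidesteps the DBFS-root access check that dbutils.fs.cp hits when
    it treats a bare /Workspace path as a DBFS path.
    """
    return shutil.copy(src, dst)

# Hypothetical paths, for illustration only:
# copy_workspace_file_to_volume(
#     "/Workspace/path/to/the/file",
#     "/Volumes/catalog/schema/volume/file",
# )
```

An alternative that keeps `dbutils` is prefixing the source with the `file:/` scheme, e.g. `dbutils.fs.cp("file:/Workspace/path/to/the/file", "/Volumes/path/to/destination")`, so the workspace path is read from the local filesystem rather than resolved against DBFS.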
Aneruth
by New Contributor II
  • 805 Views
  • 1 reply
  • 0 kudos

[INTERNAL_ERROR] Cannot refresh quality dashboard

Hi all, I'm encountering an INTERNAL_ERROR issue when refreshing a Databricks Lakehouse Monitoring job. Here's the full error message: `ProfilingError: INTERNAL_ERROR. Please contact the Databricks team for further assistance and include the refresh id...

Latest Reply
Aneruth
New Contributor II
  • 0 kudos

Thank you! I'll modify my query based on your explanation. Currently, I'm manually parsing the custom metrics output data types, which works but isn't ideal. I'll implement proper data type formatting through asset bundles to ensure the UI receives c...

san11
by New Contributor II
  • 980 Views
  • 2 replies
  • 0 kudos

Enabled IP access list for azure databricks workspace but it is not working

Hi, we enabled the IP access list for our Azure Databricks workspace using the REST API, and we can see the IPs in the allow and block lists, but it is not working: we can still log in to the Web UI from any IP address and run queries. Does this approach not...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello @san11, what error are you getting? I mean, is there any error in the Web UI? Can you also share a screenshot of the IPs you allowed in the Azure portal?

1 More Replies
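A frequent cause of the behavior described in this thread is that the lists are created but enforcement was never switched on: the `enableIpAccessLists` workspace setting must also be `"true"`. A hedged sketch of the `workspace-conf` call that enables it (the workspace URL below is a placeholder, and the request would be sent with an admin bearer token):

```python
def build_enable_ip_access_request(host: str):
    """Build the workspace-conf call that switches IP access list
    enforcement on.

    Creating allow/block lists via the REST API does not enforce them
    by itself: the enableIpAccessLists workspace setting must also be
    "true", which is a common reason the lists appear to be ignored.
    """
    url = f"{host.rstrip('/')}/api/2.0/workspace-conf"
    payload = {"enableIpAccessLists": "true"}
    return "PATCH", url, payload

# Usage (hypothetical workspace URL; send with requests.patch plus a
# bearer token for a workspace admin):
# method, url, payload = build_enable_ip_access_request(
#     "https://adb-1234567890123456.7.azuredatabricks.net")
```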
HariharaSam
by Databricks Partner
  • 39873 Views
  • 10 replies
  • 4 kudos

Resolved! To get Number of rows inserted after performing an Insert operation into a table

Consider we have two tables, A and B: `qry = """INSERT INTO Table A SELECT * FROM Table B WHERE Id IS NULL"""` then `spark.sql(qry)`. I need to get the number of records inserted after running this in Databricks.

Latest Reply
User16653924625
Databricks Employee
  • 4 kudos

In case someone is looking for a purely SQL-based solution (add LIMIT 1 to the query if you only want the last operation): select t.timestamp, t.operation, t.operationMetrics.numOutputRows as numOutputRows from ( DESCRIBE HISTORY <catalog>.<schema>....

9 More Replies
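The answers in this thread rely on the table's Delta history, where each commit records `operationMetrics.numOutputRows`. The extraction step can be sketched in plain Python; in a notebook the rows would come from something like `spark.sql(f"DESCRIBE HISTORY {table}").collect()`, but here they are plain dicts for illustration ("WRITE" is the operation name Delta records for INSERT INTO):

```python
def last_insert_row_count(history_rows):
    """Return numOutputRows for the most recent WRITE commit.

    history_rows: DESCRIBE HISTORY output, newest first, where each
    row carries an 'operation' name and an 'operationMetrics' mapping
    of string-valued metrics.
    """
    for row in history_rows:
        if row["operation"] == "WRITE":
            metrics = row.get("operationMetrics") or {}
            n = metrics.get("numOutputRows")
            return int(n) if n is not None else None
    return None
```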
ajgold
by New Contributor II
  • 1563 Views
  • 6 replies
  • 2 kudos

DLT Expectations Alert for Warning

I want to receive an alert via email or Slack when the @dlt.expect declaration fails the validation check in my DLT pipeline. I only see the option to add an email alert for @dlt.expect_or_fail failures, but not for warnings.

Latest Reply
RiyazAliM
Honored Contributor
  • 2 kudos

Hey @ajgold, I don't think DLT has this feature yet. You may raise a feature request for Databricks to add it in a future release over here: https://databricks.aha.io/ Cheers!

5 More Replies
ande
by New Contributor
  • 2557 Views
  • 2 replies
  • 0 kudos

IP address for accessing external SFTP server

I am trying to pull in data to my Databricks workspace via an external SFTP server. I am using Azure for my compute. To access the SFTP server they need to whitelist my IP address. My IP address in Azure Databricks seems to be constantly changing fro...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Azure Databricks, like many cloud services, does not provide static IP addresses for outbound connections. This is because the compute resources are dynamically allocated and can change over time. One potential workaround could be to use a Virtual N...

1 More Replies
fjrodriguez
by New Contributor III
  • 642 Views
  • 2 replies
  • 0 kudos

Job Preview in ADF

I have one Spark job that is triggered via ADF as a usual "Python" activity. Now I want to move to the "Job" activity, which is in Preview. Normally, at the linked service level, I have the Spark config and environment that are needed for the execution of this scri...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi @fjrodriguez, my understanding is you've already created a cluster for your job. If that's the case, you can put that Spark configuration and those env variables directly in the cluster your job is using. If for some reason that's not possible, then you c...

1 More Replies
jdlogos
by New Contributor III
  • 4850 Views
  • 5 replies
  • 2 kudos

apply_changes_from_snapshot with expectations

Hi, Question: Are expectations supposed to function in conjunction with create_streaming_table() and apply_changes_from_snapshot()? Our team is investigating Delta Live Tables and we have a working prototype using Autoloader to ingest some files from a m...

Latest Reply
jbrmn
New Contributor II
  • 2 kudos

Also facing the same issue - did you find a solution? Thinking I will have to apply expectations at the next stage of the pipeline until this is worked out.

4 More Replies
dnz
by New Contributor
  • 1342 Views
  • 1 reply
  • 0 kudos

Performance Issue with OPTIMIZE Command for Historical Data Migration Using Liquid Clustering

Hello Databricks Community, I'm experiencing performance issues with the OPTIMIZE command when migrating historical data into a table with liquid clustering. Specifically, I am processing one year's worth of data at a time. For example: the OPTIMIZE co...

Latest Reply
HimanshuSingh
New Contributor II
  • 0 kudos

Did you get any solution? If yes, please post it.

yuinagam
by New Contributor II
  • 853 Views
  • 2 replies
  • 0 kudos

how can I verify that the result of a dlt will have enough rows before updating the table?

I have a DLT/Lakeflow pipeline that creates a table, and I need to make sure that it will only update the resulting materialized view if it will have more than one million records. I've found this, but it seems to only work if I have already updated t...

Latest Reply
yuinagam
New Contributor II
  • 0 kudos

Thank you for the quick reply. Is there a common/recommended/possible way to work around this limitation? I don't mind not using the expectation API if it doesn't support logic that's based on aggregations.

1 More Replies
shan-databricks
by Databricks Partner
  • 3652 Views
  • 9 replies
  • 4 kudos

Resolved! Databricks Autoloader BadRecords path Issue

I have one file with 100 rows, of which two rows are bad data and the remaining 98 rows are good data. But when I use badRecordsPath, it moves the entire file to the bad records path, good data included, when it should move ...

Latest Reply
ShaileshBobay
Databricks Employee
  • 4 kudos

Why Entire Files Go to badRecordsPath: when you enable badRecordsPath in Autoloader or in Spark's file readers (with formats like CSV/JSON), here's what happens: Spark expects each data file to be internally well-formed with respect to the declared s...

8 More Replies
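The reply above draws a line between record-level and file-level handling: bad-record options can only quarantine individual rows when Spark can still parse the file against the declared schema, while a structurally broken file is quarantined whole. The record-level half of that behavior can be sketched in plain Python, with column count standing in for the schema check:

```python
import csv
import io

def split_good_bad(csv_text: str, expected_cols: int):
    """Split CSV rows into (good, bad) lists by column count.

    Mirrors record-level bad-record handling: only malformed rows are
    set aside, so the 98 good rows of a 100-row file survive instead
    of the whole file being quarantined.
    """
    good, bad = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        (good if len(row) == expected_cols else bad).append(row)
    return good, bad
```

In Spark itself, the analogous record-level option for CSV/JSON is PERMISSIVE mode with a corrupt-record column (or Auto Loader's rescued-data column), which keeps the parsable rows and captures malformed ones per record rather than per file.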
yit
by Databricks Partner
  • 5270 Views
  • 8 replies
  • 4 kudos

Resolved! Schema evolution for JSON files with AutoLoader

I am using Auto Loader to ingest JSON files into a managed table. Auto Loader saves only the first-level fields as new columns, while nested structs are stored as values within those columns. My goal is to support schema evolution when loading new fi...

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 4 kudos

@yit awesome. Glad that you got this solved. I look forward to the next problem. All the best, BS

7 More Replies
ZD
by New Contributor III
  • 2406 Views
  • 5 replies
  • 0 kudos

How to replace ${param} by :param

Hello, we previously used ${param} in our SQL queries: SELECT * FROM json.`${source_path}/file.json` However, this syntax is now deprecated. The recommended approach is to use :param instead. But when I attempt to replace ${param} with :param, I encounte...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi @ZD, please try this syntax in your notebook for SQL: %sql declare _my_path = 'some_path'; select _my_path;

4 More Replies
Johannes_E
by New Contributor III
  • 1371 Views
  • 2 replies
  • 1 kudos

Resolved! Job cluster has no permission to create folder in Unity Catalog Volume

Hello everybody, I want to run a job that collects some CSV files from an SFTP server and saves them on my Unity Catalog Volume. While my personal cluster, defined like the following, has access to create folders on the volume, my job cluster doesn't. Defi...

Latest Reply
Johannes_E
New Contributor III
  • 1 kudos

Thank you, that helped although I had to use "SINGLE_USER" instead of "DATA_SECURITY_MODE_DEDICATED". According to the docs (https://docs.databricks.com/api/workspace/clusters/create) "SINGLE_USER" is an alias for "DATA_SECURITY_MODE_DEDICATED".

1 More Replies