Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Layer
by New Contributor
  • 860 Views
  • 1 reply
  • 0 kudos

Can I send multiple POST requests to an API endpoint and find out whether they all succeeded?

Hello, I am trying to send multiple POST requests to an endpoint. I have a Spark DataFrame, and each column of this DataFrame is sent through the payload of the POST request. However, when I run this in my notebook, no exception is raised. I'm guessing i...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

The return type for foreachPartition is None, so this is expected. If you're looking to do arbitrary code execution and return a result, mapInPandas or Pandas UDFs are good choices - you'd want to combine those with something like a .toLocalIterator ...

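
For readers landing here, a minimal sketch of the approach cgrant describes: mapInPandas sends one POST per row and returns a status column, so success can be verified afterwards. The endpoint URL and payload columns are illustrative assumptions, not from the thread.

    # Hedged sketch: one POST per row via mapInPandas; statuses come back as a
    # DataFrame, unlike foreachPartition, whose return type is None.
    import pandas as pd
    import requests
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a", 1), ("b", 2)], ["name", "value"])  # stand-in payload data

    def post_partition(batches):
        for pdf in batches:  # pdf is a pandas DataFrame for one Arrow batch
            statuses = [
                requests.post("https://example.com/api", json=row).status_code  # assumed URL
                for row in pdf.to_dict(orient="records")
            ]
            yield pd.DataFrame({"status": statuses})

    result = df.mapInPandas(post_partition, schema="status int")
    failed = result.filter((col("status") < 200) | (col("status") >= 300)).count()
    print("all succeeded" if failed == 0 else f"{failed} requests failed")
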
lauraxyz
by Contributor
  • 3678 Views
  • 8 replies
  • 4 kudos

How to execute a .sql file in a Volume

I have giant queries (SELECT .. FROM) that I store in .sql files. I want to put those files in a Volume and run the queries from a workflow task. I can load the file content into a 'text' format string, then run the query. My question is, is there...

Latest Reply
lauraxyz
Contributor
  • 4 kudos

Issue resolved: for .py, I was using Spark, and I had to explicitly create the Spark session so that it can run properly and insert data.

7 More Replies
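
A minimal sketch of the resolved pattern: load the .sql file from a Volume and run it from a .py workflow task, creating the SparkSession explicitly as lauraxyz notes. The Volume path is an illustrative assumption.

    # Hedged sketch: execute a query stored as a .sql file in a Unity Catalog Volume.
    from pyspark.sql import SparkSession

    # In a .py task the session is not pre-created, so build it explicitly.
    spark = SparkSession.builder.getOrCreate()

    # Volumes are exposed as files under /Volumes/<catalog>/<schema>/<volume>/.
    sql_path = "/Volumes/main/default/queries/my_query.sql"  # assumed path
    with open(sql_path) as f:
        query = f.read()  # assumes the file holds a single SELECT statement

    spark.sql(query).show()
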
100databricks
by New Contributor III
  • 1616 Views
  • 2 replies
  • 1 kudos

Resolved! How can I force a data frame to evaluate without saving it?

The problem at hand requires me to take a set of actions on a very large DataFrame df_1. This set of actions results in a second DataFrame df_2, and from this second DataFrame I have multiple downstream tasks, task_1, task_2 ... By default, t...

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

Hi @100databricks, yes, you can run df_2.cache() or df_2.persist() (df_2.cache() is a shortcut for df_2.persist() with the default storage level, which is MEMORY_AND_DISK for DataFrames). Here is the pseudo-code: # df_1 is your large initial DataFrame df_1 = ... # Perform expensive transformations ...

1 More Replies
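
A minimal sketch of filipniziol's suggestion: cache df_2 and trigger a single action so every downstream task reuses the materialized result instead of recomputing df_1. The transformations below are stand-ins.

    # Hedged sketch: force evaluation of a cached DataFrame without saving it.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df_1 = spark.range(100_000_000)                   # stand-in for the large DataFrame
    df_2 = df_1.selectExpr("id", "id * 2 AS twice")   # stand-in for expensive transforms

    df_2.cache()   # marks df_2 for caching (lazy, nothing happens yet)
    df_2.count()   # an action: materializes df_2 into the cache

    # Downstream tasks now read from the cache instead of recomputing df_1.
    task_1 = df_2.filter("twice % 4 = 0").count()
    task_2 = df_2.agg({"twice": "max"}).collect()

    df_2.unpersist()  # release the cache once the downstream work is done
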
RobsonNLPT
by Contributor III
  • 4393 Views
  • 2 replies
  • 0 kudos

Delta Identity latest value after insert

Hi all. I would like to know if Databricks has some feature to retrieve the latest identity column value (always generated) after insert or upsert operations (DataFrame APIs and SQL). Database engines such as Azure SQL and Oracle have a feature that enable...

Latest Reply
tapash-db
Databricks Employee
  • 0 kudos

Hi, you can always query "SELECT MAX(identity_column) FROM your_table_name" to see the latest value of the identity column. However, there are no direct functions available to return the latest identity column value.

1 More Replies
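
A minimal sketch of the workaround tapash-db suggests, run immediately after the insert; the table and column names are illustrative assumptions.

    # Hedged sketch: fetch the highest identity value after an insert.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.sql("INSERT INTO my_catalog.my_schema.events (payload) VALUES ('a'), ('b')")

    # No "last inserted identity" function exists, so query the max explicitly.
    # Caveat: with concurrent writers this may include rows from other inserts.
    latest = spark.sql(
        "SELECT MAX(event_id) AS latest FROM my_catalog.my_schema.events"
    ).first()["latest"]
    print(latest)
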
eballinger
by Contributor
  • 1907 Views
  • 2 replies
  • 0 kudos

Looking for ways to speed up DLT testing

Hi guys, I am new to this community. I am guessing we have a typical setup (DLT tables, 3 layers - bronze, silver, and gold), and while it works fine in our development environment, I have always looked for ways to speed things up for testers. For exampl...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

There isn't a direct way to achieve this within the current DLT framework. When a DLT table is undeclared, it is designed to be removed from the pipeline, which includes the underlying data. However, there are a few strategies you can consider to spe...

1 More Replies
darkolexis
by New Contributor
  • 3803 Views
  • 2 replies
  • 1 kudos

Service Principal types in Azure Databricks

In Azure Databricks, we can create two types of Service Principals, namely: 1. Databricks Managed SP; 2. Microsoft Entra ID Managed SP. What is the difference between the two, other than one being specific to a single workspace and the other being usable from m...

Latest Reply
arunprakash1986
New Contributor II
  • 1 kudos

So, what use would it be in a situation where I have a Docker image that runs as a job using Databricks compute? Here the job has "Run As" set to a service principal, say "svc1", which is a Databricks-managed service principal. I believe that...

1 More Replies
Cosimo_F_
by Contributor
  • 2507 Views
  • 4 replies
  • 0 kudos

Autoloader schema inference

Hello, is it possible to turn off schema inference with Auto Loader? Thank you, Cosimo

Latest Reply
shivagarg
New Contributor II
  • 0 kudos

https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/patterns.html#language-python: you can enforce the schema or use "cloudFiles.schemaHints" to override the inference. df = spark.readStream.format("cloudFiles") \ .option("...

3 More Replies
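
A minimal sketch of the two options shivagarg mentions: pass an explicit schema so inference is skipped entirely, or keep inference but pin selected columns with schemaHints. Paths and the file format are illustrative assumptions.

    # Hedged sketch: Auto Loader without schema inference.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import LongType, StringType, StructField, StructType

    spark = SparkSession.builder.getOrCreate()

    schema = StructType([
        StructField("id", LongType()),
        StructField("name", StringType()),
    ])

    # Option 1: an explicit schema turns inference off.
    df = (spark.readStream.format("cloudFiles")
          .option("cloudFiles.format", "json")
          .schema(schema)
          .load("/Volumes/main/default/landing/"))  # assumed path

    # Option 2: keep inference, but override chosen columns with hints.
    df_hints = (spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "json")
                .option("cloudFiles.schemaLocation", "/Volumes/main/default/_schemas/")
                .option("cloudFiles.schemaHints", "id BIGINT, name STRING")
                .load("/Volumes/main/default/landing/"))
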
aranjan99
by Contributor
  • 10432 Views
  • 4 replies
  • 1 kudos

system.access.table_lineage table missing data

I am using the system.access.table_lineage table to figure out the tables accessed by SQL queries and the corresponding SQL queries. However, I am noticing this table is missing data or values very often. For example, for SQL queries executed by our DBT jobs, ...

Latest Reply
goldenmountain
New Contributor II
  • 1 kudos

@aranjan99 did you ever get an answer or conclusion about the limitations of Unity Catalog with regard to tracking access via SQL?

3 More Replies
Direo
by Contributor II
  • 3183 Views
  • 2 replies
  • 0 kudos

Migrating to Unity Catalog: Read-Only Connections to SQL Server and Snowflake

We are in the process of migrating to Unity Catalog, establishing connections to SQL Server and Snowflake, and creating foreign catalogs that mirror our SQL Server and Snowflake databases. This allows us to leverage Unity Catalog’s query syntax and ...

Labels: Data Engineering, UnityCatalog, SQLServer, Snowflake, Governance, Permissions
Latest Reply
goldenmountain
New Contributor II
  • 0 kudos

I’m also trying to figure out if this is a limitation in Unity Catalog. I recently used a JDBC URL to write data to an Amazon Aurora PostgreSQL database, but noticed that no entries appeared in the `system.access.table_lineage` table. Has anyone else...

1 More Replies
Tamizh035
by New Contributor II
  • 3368 Views
  • 3 replies
  • 1 kudos

[INSUFFICIENT_PERMISSIONS] Insufficient privileges:

While reading a CSV file using Spark and listing the files under a folder using Databricks utils, I am getting the below error: [INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not have permission SELECT on any file. SQLSTATE: 42501 File <comma...

Latest Reply
mpalacio
New Contributor II
  • 1 kudos

I have the same issue, did you manage to solve it? I have the Databricks extension configured correctly and my role has enough permissions. Everything used to work properly, but now when I run my notebooks it gives me this issue and 'no module named dbrun...

2 More Replies
brendanc19
by New Contributor III
  • 6327 Views
  • 6 replies
  • 2 kudos

Resolved! Does cancelling a job run rollback any actions performed by query plan?

If I were to stop a rather large job run, say halfway through execution, will any actions performed on our Delta tables persist, or will they be rolled back? Are there any other risks that I need to be aware of in terms of cancelling a job run halfway t...

Latest Reply
fabian_r
New Contributor II
  • 2 kudos

Hi, is there any way in 2024 to ensure transaction control across tables in the Delta protocol for failing jobs?

5 More Replies
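
Context for this thread: Delta commits are atomic per table, so a cancelled run leaves each table at its last completed commit; commits that already finished are not rolled back, and there are no cross-table transactions. A hedged sketch of the usual manual recovery, with an illustrative table name and version:

    # Hedged sketch: roll a Delta table back by hand after a cancelled run.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Inspect recent commits to find the version the table had before the job.
    spark.sql("DESCRIBE HISTORY my_catalog.my_schema.target_table LIMIT 10").show()

    # Restore to that pre-job version (42 is an assumed example).
    spark.sql("RESTORE TABLE my_catalog.my_schema.target_table TO VERSION AS OF 42")
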
techg
by New Contributor II
  • 1686 Views
  • 4 replies
  • 1 kudos

Missing selection for Parameter error

Hi all, I have created three parameters in a SQL query in Databricks. If no value is entered for a parameter, I would like the query to retrieve all values for that particular column. Currently, I'm getting an error message: "Missing selection for Pa...

Latest Reply
techg
New Contributor II
  • 1 kudos

I'm creating this query with parameters in the SQL Editor in Databricks and added it to the SQL Dashboard. Do we need to create a widget when creating parameters in the SQL Editor? When I tried creating a widget in the SQL Editor, I'm getting a syntax error near Widget...

3 More Replies
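
A minimal sketch of the common workaround for "empty parameter means all rows": write the predicate so a blank value disables the filter. Shown here with spark.sql named parameters (Spark 3.4+); the same WHERE pattern applies to :param parameters in the SQL Editor. Table and column names are assumptions.

    # Hedged sketch: treat an empty parameter as "match everything".
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    query = """
        SELECT *
        FROM my_catalog.my_schema.sales            -- assumed table
        WHERE (:region = '' OR region = :region)   -- blank disables the filter
    """

    spark.sql(query, args={"region": ""}).show()      # all regions
    spark.sql(query, args={"region": "EMEA"}).show()  # one region only
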
Gusman
by New Contributor II
  • 1157 Views
  • 2 replies
  • 1 kudos

Resolved! Natural language queries through REST API?

Natural language queries provided by Genie are really powerful and a compelling tool. Is there any way to execute these natural language queries through the REST API to integrate them into in-house applications?

Latest Reply
stacey45
New Contributor II
  • 1 kudos

@Gusman wrote: "Natural language queries provided by Genie are really powerful and a compelling tool. Is there any way to execute these natural language queries through the REST API to integrate them into in-house applications?" While there's no direct RES...

1 More Replies
Clara
by New Contributor
  • 599 Views
  • 1 reply
  • 1 kudos

Retrieve data older than the one-year window: system.access.table_lineage

Hello, I am currently using table_lineage from system.access.table_lineage. It is a great feature, but I am experiencing missing data. After some searching I have seen that "Because lineage is computed on a one-year rolling window, lineage collected more ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Clara, I don't think so. But you can build such history tables yourself: design an ETL process that extracts data from the system tables and stores it in your own tables.

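
A minimal sketch of the ETL szymon_dybczak describes: periodically append new lineage rows into a table you own so history survives the one-year rolling window. The target table name is an assumption.

    # Hedged sketch: persist system.access.table_lineage beyond its rolling window.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    target = "my_catalog.ops.table_lineage_history"  # assumed table

    # Create an empty copy of the system table's schema on first run.
    spark.sql(f"CREATE TABLE IF NOT EXISTS {target} AS "
              f"SELECT * FROM system.access.table_lineage WHERE 1 = 0")

    # Append only rows newer than the last capture; schedule this job (e.g. daily).
    spark.sql(f"""
        INSERT INTO {target}
        SELECT * FROM system.access.table_lineage
        WHERE event_time > (SELECT COALESCE(MAX(event_time),
                                            TIMESTAMP '1970-01-01') FROM {target})
    """)
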
sboxi
by New Contributor II
  • 782 Views
  • 2 replies
  • 1 kudos

Can we create a Materialized View on an existing view and table?

Dear all, is it possible to create a Materialized View on a view and a table (joining the view and the table)? I suspect it is not possible. Please suggest. Also, please provide the best way to schedule the refresh of a Materialized View. Regards, Surya

Latest Reply
sboxi
New Contributor II
  • 1 kudos

Thanks @Alberto_Umana. I will try that.

1 More Replies
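
For reference, a hedged sketch of a materialized view that joins a table and a view, with a scheduled refresh (Databricks SQL DDL; all names and the cron expression are illustrative, and the usual Unity Catalog restrictions on source objects apply):

    # Hedged sketch: materialized view over a table joined to a view,
    # refreshed on a schedule. Names below are assumptions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    spark.sql("""
        CREATE MATERIALIZED VIEW my_catalog.my_schema.orders_enriched
        SCHEDULE CRON '0 0 2 * * ?'               -- refresh daily at 02:00
        AS SELECT o.order_id, o.amount, c.segment
        FROM my_catalog.my_schema.orders o        -- a table
        JOIN my_catalog.my_schema.v_customers c   -- a view
          ON o.customer_id = c.customer_id
    """)
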
