Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826992666
by Valued Contributor
  • 1004 Views
  • 1 reply
  • 0 kudos

Where can I find the tables I created in my Delta Live Tables pipeline?

I created several tables in my DLT pipeline but didn't specify a location to save them on creation. The pipeline seems to have run, but I don't know where the tables actually are. How can I find them?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Check out the storage configuration under settings. If you didn't specify the storage setting, the system defaults to a location under dbfs:/pipelines/
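A quick way to see what's there (a minimal sketch, assuming the documented default root; each subfolder is named for a pipeline's ID):

# List the default DLT storage root in a Databricks notebook.
for entry in dbutils.fs.ls("dbfs:/pipelines/"):
    print(entry.path)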

User16826987838
by Contributor
  • 1166 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Yes, in your write stream you can save it as a table in the Delta format without a problem. In DBR 8, the default table format is Delta. See this code; please note that the "..." is supplied to show that additional options may be required: df.writeSt...
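A minimal sketch of that pattern (not the reply's exact code; df is assumed to be a streaming DataFrame, and the checkpoint path and table name are placeholders):

(df.writeStream
   .format("delta")                                            # the default on DBR 8+
   .option("checkpointLocation", "/tmp/checkpoints/my_table")  # placeholder path
   .outputMode("append")
   .toTable("my_table"))                                       # placeholder table name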

User16826992666
by Valued Contributor
  • 1955 Views
  • 1 reply
  • 0 kudos

When using Delta Live Tables, how do I set a table to be incremental vs complete using Python?

When using SQL, I can use the Create Live Table command and the Create Incremental Live Table command to set the run type I want the table to use. But I don't seem to have the same syntax for Python. How can I set this table type while using Python?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

The documentation at https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-user-guide.html#mixing-complete-tables-and-incremental-tables has an example: the first two functions load data incrementally and the last one loads...
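In short, a hedged sketch of the pattern in those docs (source path and column names are made up): a function that returns a streaming read is treated as incremental, while one that returns a batch read is treated as complete.

import dlt

@dlt.table
def events_incremental():
    # A streaming read => the live table is updated incrementally.
    return spark.readStream.format("json").load("/data/events")  # placeholder source

@dlt.table
def events_summary_complete():
    # A batch read via dlt.read => the table is fully recomputed on each update.
    return dlt.read("events_incremental").groupBy("event_type").count()  # placeholder column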

User16826992666
by Valued Contributor
  • 1983 Views
  • 1 reply
  • 0 kudos

Is it possible to disable the maintenance job associated with a Delta Live Table?

After creating my Delta Live Table and running it once, I notice that the maintenance job that was created along with it continues to run at the scheduled time. I have not made any updates to the DLT, so the maintenance job theoretically shouldn't ha...

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

You could change the table properties of the associated tables to disable automatic scheduled optimizations. More details at https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-language-ref.html#table-properties
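In a Python pipeline it would look roughly like this (a sketch; the property key is the one described in those docs, so verify it there):

import dlt

@dlt.table(
    table_properties={
        # Disables the scheduled auto-optimization for this table.
        "pipelines.autoOptimize.managed": "false"
    }
)
def my_table():
    return spark.read.table("source_table")  # placeholder source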

User16790091296
by Contributor II
  • 2498 Views
  • 1 reply
  • 1 kudos

Secrets in databricks

I created a secret on Databricks using the Secrets API. Code:

Scope_name = {"scope": "dbtest", "initial_manage_principal": "user"}
Resp = requests.post('https://instancename.net/mynoteid/api/2.0/secrets/scopes/create', json=Scope_name)

In a similar way, I adde...

Latest Reply
aladda
Databricks Employee
  • 1 kudos

You'll have to specify the scope and the key in the format below to get the value: dbutils.secrets.get(scope="dbtest", key="user"). Probably a good idea to review the Secret Management documentation for details on how to get this set up the right way - ...
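Putting the whole flow together (a hedged sketch based on the post; the host and token are placeholders): secrets are written through the REST API, but read back in a notebook via dbutils.

import requests

host = "https://instancename.net"              # placeholder from the post
headers = {"Authorization": "Bearer <token>"}  # hypothetical personal access token

# Store a secret named "user" in the "dbtest" scope.
requests.post(f"{host}/api/2.0/secrets/put",
              json={"scope": "dbtest", "key": "user", "string_value": "..."},
              headers=headers)

# Read it back from a notebook.
value = dbutils.secrets.get(scope="dbtest", key="user")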

User16826992666
by Valued Contributor
  • 1567 Views
  • 1 reply
  • 0 kudos

Resolved! MLflow Model Serving latency expectations

What kind of latency should I expect when using the built-in model serving capability in MLflow? Evaluating whether it would be a good fit for our use case.

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

What are your throughput requirements in addition to latency? Currently this is in private preview, and Databricks recommends it only for low-throughput and non-critical applications. However, as it moves towards GA, this would change. Please get in...

brickster_2018
by Databricks Employee
  • 1159 Views
  • 1 reply
  • 1 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

val oldestVersionAvailable = 
val newestVersionAvailable = 
val pathToDeltaTable = ""
val pathToFileName = ""
(oldestVersionAvailable to newestVersionAvailable).map { version =>
  var df1 = spark.read.json(f"$pathToDeltaTable/_delta_log/$version%0...

User16826992666
by Valued Contributor
  • 2638 Views
  • 1 reply
  • 1 kudos

Trying to write my dataframe out as a tab separated .txt file but getting an error

When I try to save my file I get: org.apache.spark.sql.AnalysisException: Text data source supports only a single column, and you have 2 columns. Is there any way to save a dataframe with more than one column to a .txt file?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

Would pyspark.sql.DataFrameWriter.csv work? You could specify the separator (sep) as tab: df.write.csv(os.path.join(tempfile.mkdtemp(), 'data'), sep='\t')
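A self-contained sketch of that suggestion (the sample dataframe and output path are made up):

import os
import tempfile

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
(df.write
   .option("sep", "\t")     # tab separator
   .option("header", True)  # optional header row
   .mode("overwrite")
   .csv(os.path.join(tempfile.mkdtemp(), "data")))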

brickster_2018
by Databricks Employee
  • 1517 Views
  • 1 reply
  • 1 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

%scala
display(
  spark.read.json("//path-to-delta-table/_delta_log/0000000000000000000x.json")
    .where("add is not null")
    .select("add.path")
)

jason_mcdonald
by New Contributor
  • 1632 Views
  • 2 replies
  • 0 kudos

Is there a way to set DBU or cost limits so I don't get an unexpected bill?

I'm wondering if there's a way to set a monthly budget and have my workloads stop running if I hit it.

Latest Reply
aladda
Databricks Employee
  • 0 kudos

Cluster Policies would help with this, not only from a cost-management perspective but also for standardization of resources across the organization, as well as simplification for a better user experience. You can find Best Practices on leveraging cluster pol...
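As a rough illustration (a sketch using the Cluster Policies API; the workspace URL, token, and limit are placeholders), a policy can cap the DBU rate of any cluster created under it. Note that this caps the spend rate per cluster rather than enforcing a literal monthly budget.

import json
import requests

policy = {
    # Caps any cluster created under this policy at 10 DBUs per hour.
    "dbus_per_hour": {"type": "range", "maxValue": 10}
}
requests.post("https://<workspace>/api/2.0/policies/clusters/create",
              headers={"Authorization": "Bearer <token>"},
              json={"name": "cost-capped", "definition": json.dumps(policy)})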

User16826992666
by Valued Contributor
  • 1642 Views
  • 1 reply
  • 0 kudos

What is the default location where dataframes are written if I don't specify a location?

If I save a dataframe without specifying a location, where will it end up?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

You can't save a dataframe without specifying a location. If you are using the saveAsTable API, the table will be created in the Hive warehouse location. The default location is /user/hive/warehouse.
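To see where a saved table actually landed (a minimal sketch; the table name is a placeholder):

df = spark.range(5)  # toy dataframe
df.write.mode("overwrite").saveAsTable("my_table")  # lands under the warehouse dir

# Inspect the resolved storage location (look for the "Location" row).
spark.sql("DESCRIBE TABLE EXTENDED my_table").show(truncate=False)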

User16826992666
by Valued Contributor
  • 1335 Views
  • 1 reply
  • 0 kudos

Why would I make a deep clone of a Delta table vs reading the table and writing a copy to a new location?

It seems like with both techniques I would end up with a copy of my table. Trying to understand when I should be using a deep clone.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

A deep clone is the recommended way, as it copies the table's metadata along with the data. A DEEP CLONE is also faster than the read-write approach.
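For reference, the SQL form run from Python (a sketch; the table names are placeholders):

spark.sql("""
  CREATE TABLE IF NOT EXISTS analytics.events_clone
  DEEP CLONE analytics.events
""")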

User16826992666
by Valued Contributor
  • 1408 Views
  • 1 reply
  • 0 kudos

How can I run OPTIMIZE on a table if I am streaming to it 24/7?

I have a table that I need to be continuously streaming into. I know it's best practice to run Optimize on my tables periodically. But if I never stop writing to the table, how and when can I run OPTIMIZE against it?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

If the streaming job is making blind appends to the Delta table, then it's perfectly fine to run an OPTIMIZE query in parallel. However, if the streaming job is performing MERGE or UPDATE, then it can conflict with the OPTIMIZE operations. In such cases w...
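So for the append-only case, something like this can run from a separate scheduled job while the stream keeps writing (a sketch; the table name is a placeholder):

# Runs safely alongside a blind-append stream; conflicts arise only with concurrent MERGE/UPDATE.
spark.sql("OPTIMIZE events_table")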

Anonymous
by Not applicable
  • 1664 Views
  • 1 reply
  • 0 kudos

DBFS Permissions

Is there permission control on the folder/file level in DBFS? E.g., if a team member uploads a file to /Filestore/Tables/TestData/testfile, could we mask permissions on TestData and/or testfile?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

DBFS does not support ACLs at this point.

